HIGHLIGHTS

• Two recent studies provide evidence that attending the class of a high-value-added teacher predicts 
higher-than-expected educational attainment, earnings, and other adult outcomes. 

• In one study, part of the impact of attending an effective classroom may have been attributable to 
small class size; in the other, part of the effect may be attributable to the effectiveness of the 
school. 

• Teacher value-added scores "fade out" over time: knowing that a student had a teacher with a high 
value-added score one year provides little information about how well that student will fare on 
achievement tests several years later. 

• The studies provide important new evidence on the significance of early classroom experience to 
later success. 

INTRODUCTION 

Proposals to evaluate teachers based on their "value-added" to student test scores generate intense 
debate. Underlying the debate are concerns about three factors: bias, precision, and relevance. Previous 
Carnegie Foundation briefs have detailed the reasons why the first two are significant concerns. But
even if value-added scores were unbiased and reasonably precise, their usefulness for evaluating 
teaching would still depend on the third factor— their relevance to the aims of schooling. After all, 
helping children do well on an achievement test is of little value in itself. The question is whether a test
score gain in a given year of schooling represents growth in skills that matter over the long term.

My aim in the current brief is to consider a key aspect of the relevance of value-added scores: their 
predictive validity— whether teachers who produce high value-added on achievement tests also 
engender lasting cognitive and non-cognitive skills that help prepare their students for success in later 
life. 

One measure of the predictive validity of value-added is the extent to which it persists or "fades out" 
over time. At issue is whether elevated value-added scores displayed during the initial year persist in 
subsequent years. I review 10 studies of the persistence of value-added scores. In each case, researchers 
compute value-added scores in an "initial year" and then in subsequent years. All studies show that 
value-added scores tend to fade out over time. Five years after the initial year, it appears that 75 
percent to 100 percent of the initial impact has disappeared. 

One might infer from these results that attending the class of a teacher with high value-added in a given 
year has little consequence beyond that year. However, two careful, large-scale studies, reviewed in 
detail below, suggest that despite the lack of persistence of value-added on future test scores, one year 
of experience with a high-value-added teacher predicts higher rates of college attendance and adult 
earnings, as well as other important outcomes. While the effects are not large for individual students, 
they become substantial when they are aggregated over the students a teacher encounters. Moreover, 
the cumulative effects of a sequence of effective teachers may be substantial for an individual student. 

The seeming contradiction between the lack of persistence of value-added on achievement test scores 
and the significant impacts on adult outcomes frames an important puzzle for future research. One 
possible explanation is that teachers who produce comparatively high gains on test scores are also
effective in producing gains in other skills— deemed "non-cognitive" skills— that matter in the labor 
market and in other aspects of adult life. And one study suggests that high-value-added teachers are 
indeed comparatively effective in promoting high levels of effort, initiative, and classroom participation 
among their students. Another possibility is that teachers who produce high gains on test scores also 
produce high gains on deeper cognitive skills, such as reasoning and problem-solving, that current tests 
may not fully capture but that pay off in the labor market.

Skeptics may argue that value-added scores in the initial year and estimates of later impacts share a 
bias. For example, it might be that high-value-added teachers work in particularly effective schools, and 
that students who attend these schools for sustained periods see not only high initial test scores but 
also favorable long-term effects. Despite efforts by researchers to identify and remove such biases, they 
cannot be entirely discounted. More specifically, Chetty et al. (2013) attribute students' labor market 
gains to the value-added of individual teachers, despite the fact that some of these gains may be 
attributable to attending an effective school.[iv] And teacher effects estimated by Chetty et al. (2011)
appear to include the impact of reduced class size in addition to the impact of individual teacher skill. 

The subsequent sections of this brief consider in more detail (a) the size of the initial value-added 
effects; (b) the persistence of initial value-added; (c) reported impacts on adult outcomes; (d) potential 
explanations for these findings and suggestions for further research; and (e) implications for school 
practice. 

MAGNITUDE OF INITIAL VALUE-ADDED EFFECTS 

To calibrate the importance of teacher value-added for students' future outcomes, we need to think a 
bit about how much teachers vary in the value-added score for a single year. If these initial scores vary a 
great deal, they might also strongly predict later outcomes. But if they vary only a little, one would 
expect them to be of little use in predicting future outcomes. The consensus seems to be that attending 
the class of a teacher who is one standard deviation above her peers in value-added is associated with a 
gain in achievement of 10 to 15 percent of a standard deviation in student achievement. To make this 
meaningful, consider a comparatively "good" teacher, one who is at the 70th percentile of the teacher
distribution in value-added. A teacher one standard deviation below such a teacher is at about the 30th
percentile. Studies so far tell us that we can expect the "better" teacher's students to score about 6
percentile points higher, on average, on a standardized achievement test than the students of the
"worse" teacher. This is a difference between being at the 53rd percentile and the 47th percentile. While
this impact may seem small, it would be quite significant when aggregated over all the students in a 
class, if it persisted and laid the basis for lasting differences in socially valued outcomes. Educational 
interventions that can produce an effect this large in one year are often regarded as quite successful. 
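
The percentile arithmetic in this section can be checked with a short sketch. It assumes, purely for illustration, that both teacher value-added and student test scores follow a standard normal distribution; the function name `shifted_percentile` and the choice of 0.15 standard deviations (the upper end of the 10 to 15 percent range) are assumptions introduced here, not part of the studies.

```python
from statistics import NormalDist

# Assumption for illustration: value-added and test scores are standard normal.
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def shifted_percentile(start_percentile: float, shift_sd: float) -> float:
    """Return the percentile reached after moving shift_sd standard deviations."""
    z = STD_NORMAL.inv_cdf(start_percentile / 100.0)
    return 100.0 * STD_NORMAL.cdf(z + shift_sd)


# Teacher distribution: one standard deviation below the 70th percentile.
worse_teacher_percentile = shifted_percentile(70.0, -1.0)

# Student distribution: a gap of 0.15 SD, centered on the median for simplicity.
effect_sd = 0.15
lower_student = 100.0 * STD_NORMAL.cdf(-effect_sd / 2)
upper_student = 100.0 * STD_NORMAL.cdf(effect_sd / 2)

print(f"worse teacher percentile: {worse_teacher_percentile:.1f}")
print(f"student percentiles: {lower_student:.1f} vs {upper_student:.1f}")
print(f"gap in percentile points: {upper_student - lower_student:.1f}")
```

Under these assumptions the "worse" teacher lands a little above the 30th percentile, and the student gap comes out at roughly 6 percentile points around the median, in line with the figures quoted above. A gap of 0.10 standard deviations would give a correspondingly smaller difference.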

PERSISTENCE OF VALUE-ADDED 

I found 10 studies of the persistence of initial value-added over subsequent years, and these are listed in 
Table 1. These studies are based on computation of a teacher's value-added to the scores on tests taken 
one, two, or more years after a student has encountered that teacher. Looking at the table, consider 
again two teachers who differ by one standard deviation on value-added in the initial year. How much of 
this initial difference remains one year after? The lowest estimates suggest that only about 18-25 
percent of the initial difference persists. The most optimistic estimates are that 50 percent does. And 
the median is 24 percent, meaning that about a quarter of the initial value-added remains after one 
year. Fewer studies follow students for more than one year, but those that do suggest that initial
differences continue to fade, albeit perhaps at a slower rate. Two studies follow students for more than 
3 years after the initial year, and these suggest that 25 percent or less of the initial difference persists. 
The consensus seems to be that there is a substantial decay over time in value-added to future 
achievement test scores. 

Table 1: Persistence of Value-Added After Initial Year as Fraction of Value-Added During the Initial Year

| Study | Sample | Year 1 | Year 2 | Year 3 | Year >3 |
| Kinsler (2012) [v] | N=689,641 students, grades 3-5, 1998-2005, in North Carolina | .24 (math), .14 (reading) | | | |
| Master, Loeb, and Wyckoff (2014) | N=700,000 students, grades 3-8, 2005-2226 in New York City | .19 (math), .21 (language arts) | | | |
| McCaffrey et al. (2004) [vi] | N=678, grades 3-5, large suburban district | .25 | .15 | - | - |
| Lockwood et al. [vii] | N=10,000, grades 1-5, large urban district | .18 | .15 | .14 | .12 |
| Kane and Staiger (2008) [viii] | 97 pairs of teachers, grades 2-5, randomization of students to teachers within pairs | .50 | | | |
| Jacob, Lefgren, and Sims (2010) [ix] | N=18,240, grades 4-15, mid-size Western district | .20 | | | |
| Rothstein (2010) [x] | N=99,071, grades 3-5, North Carolina statewide | .27 (math), .33 (reading) | | | |
| Measurement of Effective Teaching (2012) | 1811 teachers randomized within schools to student rosters, grades 4-8 in 6 school districts | .45 | | | |
| Chetty et al. (2012) | 10,992 students randomized to classes within 79 schools in Tennessee | | | | 0 |
| Chetty et al. (2013) | 2.5 million children grades 3-8 in New York | .50 | .40 | .20 | .20 |
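
As a rough check on the summary figures for one-year persistence, the sketch below takes one Year 1 value per study from Table 1 and computes the range and median. Two assumptions are made here for illustration: where a study reports math and a second subject, the math figure is used, and single reported values are read as belonging to the Year 1 column. The 0.10 to 0.15 initial gap is the range given earlier for teachers one standard deviation apart.

```python
from statistics import median

# Year 1 persistence from Table 1, one value per study.
# Assumption: the math estimate is used where two subjects are reported.
year1_persistence = {
    "Kinsler (2012)": 0.24,
    "Master, Loeb, and Wyckoff (2014)": 0.19,
    "McCaffrey et al. (2004)": 0.25,
    "Lockwood et al.": 0.18,
    "Kane and Staiger (2008)": 0.50,
    "Jacob, Lefgren, and Sims (2010)": 0.20,
    "Rothstein (2010)": 0.27,
    "Measurement of Effective Teaching (2012)": 0.45,
    "Chetty et al. (2013)": 0.50,
}

values = sorted(year1_persistence.values())
lowest, highest = values[0], values[-1]
typical = median(values)

# Initial achievement gap (in student SD units) between two teachers who
# differ by one SD in value-added, and what remains of it one year later.
initial_gap_sd = (0.10, 0.15)
remaining_gap_sd = tuple(gap * typical for gap in initial_gap_sd)

print(f"lowest={lowest:.2f}, median={typical:.2f}, highest={highest:.2f}")
print(f"remaining gap after one year: {remaining_gap_sd[0]:.4f} to {remaining_gap_sd[1]:.4f} SD")
```

Read this way, the values run from .18 to .50 with a median of about one quarter, so an initial gap of 0.10 to 0.15 standard deviations shrinks to a few hundredths of a standard deviation after one year. Pooling the reading and language arts estimates as separate entries would shift these numbers slightly.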


The researchers cited in Table 1 have offered several explanations for the apparent fade-out of value- 
added scores. One is that high value-added scores in the initial year reflect teacher efforts to "teach to 
the test" rather than to produce meaningful skills. While plausible in light of the results in Table 1, this
explanation would suggest that exposure to a teacher who produces high value-added would not increase a
range of favorable long-term outcomes. It seems implausible that the teachers who are best at teaching 
to the test are also best at fostering more general skills. Testing this explanation requires us to ask 
whether exposure to a high-value-added teacher has long-term benefits, and that is the next topic of 
this brief. 

Second, it may be that teaching in the initial year produces lasting gains on the content that year's tests
measure but that tests in subsequent years measure different skills. For example, the initial test might
measure a child's ability to add double-digit numbers, while the later test might assess the child's ability 
to multiply and divide fractions. Knowing how to add may only modestly predict knowing how to 
manipulate fractions. This explanation would seem, however, to imply that we would see less 
persistence in math scores than in reading scores, because the skills required in mathematics are
believed to change more rapidly over time than the skills required for reading.
Yet we see low persistence of value-added scores in reading as well as math.[xi]

A third explanation is that subsequent teachers may be constrained in their ability to capitalize on what 
earlier teachers have taught. This explanation would be plausible if a teacher's class were composed of a 
small subset of students who gained considerable skill during the prior year and a larger subset of 
students whose previous gains were modest. The teacher of such a class might be inclined to pitch the 
instructional level to the larger set of lower-achieving students, preventing those who had benefited 
from prior good teaching from advancing quickly. This explanation seems to predict that high value-added in
the initial year would not predict favorable long-term outcomes. 

LONG-TERM IMPACT OF VALUE-ADDED 

Two of the 10 studies listed in Table 1 followed students from their elementary school classrooms into
adulthood, obtaining data on long-term outcomes, including college attendance and the quality of the 
college attended around age 20, earnings at age 28, the quality of the neighborhood of residence during 
adulthood, and teen parenthood. The two studies differ in their design, but taken together, they suggest 
that early classroom experience influences long-term outcomes and that teacher skill is a key source of 
these impacts. 

The first of these studies is based on a pioneering experiment in Tennessee that tested the impact of 
class size on student learning.[xii] In this study, which covered a period in the 1980s, students in 80 schools
were assigned at random to kindergarten and first- and second-grade classrooms that were designated 
as large or small. Teachers were also assigned to classrooms at random. Remarkably, a team of 
researchers was able to obtain extensive administrative data on the long-term outcomes of this 
experiment.[xiii] The random assignment of students to classrooms (within schools) provided a solid basis
for assessing the long-term impacts of classroom membership. These are reported in Column 1 of Table 
2 below. 

Importantly for our purposes, these impacts are not necessarily the impact of teacher effectiveness 
alone. A classroom might be effective because of its small class size or because of random variation in 
peer composition. However ill-defined the differences, the beauty of this study is that researchers were 
able to characterize the distribution of classroom impacts. Column 1 indicates that two classrooms that 
differed by one standard deviation in value-added produced a student learning difference of nearly
one-third of a standard deviation. In practical terms, this is a difference of about 9 percentile points. More
remarkably, the results imply that students attending the more effective classroom will have earned, on 
average, about $1,520 per year more as young adults than will students who had attended the less 
effective classroom. The authors were able to establish that only a part of this effect can be explained by 
class size or peer composition. Moreover, because random assignment of students to teachers occurred 
within schools, this effect cannot be attributed to the overall effectiveness of the school. It is instead 
attributable to the impact of the classroom assignment. 

The results in Column 1 suggest that the classroom to which students were assigned had an important 
influence on later outcomes. But they do not suggest that early gains on test scores help us explain 
those classroom effects. The next question the authors asked was whether classes that specifically 
boosted test scores were those that also improved long-term outcomes. The answer was that they did. 
Attending a classroom that increased test scores by one standard deviation (about 8.8 percentile points) 
during the initial year of the experiment was reported to increase college attendance, the quality of the
college attended (as indicated by the mean adult earnings of its graduates), and earnings. The impact on 
college attendance was small (just over a quarter of one percentage point in a sample in which 45.5
percent attended college), as was the impact on college quality. However, the earnings impact of $1,619
per year is large when aggregated over all of the students in the classroom. 

Three features of this study are notable. First, because it is based on random assignment of students, it 
is free of key sources of bias (see Note 1). Second, it does not establish the predictive validity of teacher 
value-added scores per se, but rather, the more global effect of attending a classroom that worked well 
for a variety of possible reasons, including its size. But the study does suggest that classrooms that 
produce test score gains also produce valuable long-term outcomes. Third, it compares classrooms 
within the same school. In contrast, value-added scores are typically computed for teachers who work in 
different schools, comparisons that pose special challenges to the validity of conclusions drawn.[xiv]

The second study of long-term outcomes directly addressed the question of teacher value-added by 
comparing teachers who work in different schools.[xv] It used a sample of 2.5 million children attending
New York City schools in grades 3-8, and it used administrative records to follow them into adulthood.
This study was not based on random assignment of children to classrooms. However, the authors took care
to identify and control for the sources of bias that might arise when it is not possible to conduct a 
randomized experiment. 

The results of this study (Column 3 of Table 2) in some ways mirror those of the Tennessee experiment 
(Columns 1 and 2). We see small but statistically significant effects of teacher value-added on college 
attendance and college quality. We see a statistically significant but much smaller impact on earnings 
(having a teacher with one standard deviation higher value-added predicted earning $350 per year more 
than expected at age 28). The authors also reported a reduction in teenage parenthood and increases in 
neighborhood socioeconomic status (as measured by the fraction of neighbors with a college education) 
and savings (as indicated by having a 401(k) retirement account). These, too, were small but statistically
significant effects. 


Table 2: Impacts of Value-Added on Adult Outcomes

| Outcome | Impact of classroom quality overall (Chetty et al. 2011) | Impact of classroom value-added (Chetty et al. 2011) | Impact of teacher value-added (Chetty et al. 2013) |
| Initial test scores | 8.8 percentiles (.32 sd) | | |
| College Attendance | | 0.28% above mean of 45.5% | 0.82% above mean of 37.22% |
| College Quality index | | 0.06 sd | 0.02 sd |
| Earnings | $1520 (8.8% above mean) | $1619 (11.1% above mean) | $350 (1.65% above mean) |
| Teen parenthood | | | 0.61% below mean of 14.3% |
| Other outcomes | | | Increases in neighborhood quality, saving with 401K |


The authors devised an ingenious test of the validity of these findings. They asked whether a cohort of 
children in a particular school in a particular grade achieved less the year after a high value-added 
teacher left than did the previous cohort of students, and, conversely, whether children gained more the
year after a high-value-added teacher joined the staff. Their findings essentially replicated the results 
shown in Table 2 (Column 3) for college attendance and college quality; however, findings regarding 
earnings were too imprecise to test impacts. A key assumption in this analysis is that the movement in 
and out of schools by effective (or ineffective) teachers is a cause of subsequent student achievement 
more than a result of past school effectiveness. This research strategy is ingenious, yet we cannot rule 
out the possibility that at least part of the impact that the authors ascribe to teachers is actually 
generated by an effective school. I say this because the authors did not control for the school a child 
attended when they assessed the association between teacher value-added and long-term outcomes. 
(See Note 11). 

This remarkable study suggests that teacher value-added has long-term consequences for children. 
Although the results are non-experimental, they are consistent with the experimental findings (Columns
1 and 2 of Table 2) concerning the impacts of effective classrooms. Taking these studies together, I
conclude that classroom effectiveness in the early grades and, more specifically, teacher effectiveness as 
indicated by value-added have some influence on important life outcomes. 

QUESTIONS FOR FUTURE RESEARCH 

The research reviewed here suggests that teacher value-added explains a modest, but not negligible, 
fraction of variation in student test scores during the initial year. However, the effects of a teacher on 
later test scores are much smaller, and most of the initial effect on test scores has faded out after three 
years. Despite the failure of value-added impacts to persist with respect to test scores, the research 
reviewed here has established that indicators of classroom effectiveness predict a range of adult 
outcomes. How can we reconcile the lack of persistence of impacts on test scores with the later 
emergence of impacts on important life outcomes? 

Several explanations for the fade-out of test score effects fail to account for the emergence of later 
outcomes. Teaching to the test might account for ephemeral effects on test scores but can hardly 
account for long-term benefits. The same is true of explanations that emphasize the inability of later 
teachers to capitalize on the gains produced by effective early teachers. The possibility that later tests 
measure skills not captured by earlier tests remains somewhat plausible. 

The single explanation posed by the authors of the long-term studies reviewed here is that teachers who 
are effective at producing initial gains in test scores are also effective in producing gains in non-cognitive
or "soft skills." This is the same explanation that researchers have offered for the long-term effects
of several experimental early childhood interventions.[xvi] These interventions showed early impacts on
test scores that faded completely over the next several years. Nevertheless, they produced favorable
outcomes over the life course. Evidence suggests that the effects of early and sustained interventions on
non-academic skills account at least in part for these long-term benefits.

Chetty et al. (2013) tested the impact of teacher value-added on an index of non-academic skills, 
including measures of initiative, effort, and collaboration. They found that teachers who produced high 
initial value-added on test scores also produced favorable non-academic skills and that these were 
correlated with the adult outcomes of interest. These findings are consistent with the notion that 
teacher impacts on non-academic skills may help us understand the puzzle of fading test score effects 
and the emergence of long-term impacts. 

I would urge caution, however, in inferring that the skill gains not measured by later achievement tests
are all "non-cognitive." At best, achievement tests capture some important aspects of cognitive skills 
needed in the labor market. It is quite plausible that teachers who are effective at producing gains on a 
given test are also good at producing gains in deeper cognitive skills not captured by standardized tests. 

Considerably more research is needed on how specific aspects of teaching contribute to the range of 
skills that pay off in adult life. The research reviewed here has shown that it is possible to trace long- 
term effects of classroom experience, so we can anticipate more studies of this type. I would encourage 
researchers to think about the connections between academic learning in various subjects and the 
development of non-academic skills such as effort, initiative, persistence, and collaboration. In school 
settings, much of what we ask of students concerns academic learning. Reasoning and problem-solving 
skills appear ever more important, and substantial effort, initiative, and collaboration are likely essential 
for developing these skills. At the same time, it is likely that success in developing academic skills 
reinforces determination, effort, and initiative. In sum, it seems likely that academic and non-academic 
skills that matter in the labor market develop together and are mutually reinforcing. 

The researchers cited above note that achievement tests may poorly reflect these non-academic skills. I 
would emphasize that these tests may also fail to capture key cognitive skills, and, in particular, 
reasoning and problem-solving. In sum, it seems that we need a theory to explain how effective teaching 
fosters a range of skills and dispositions that, together, shape prospects for future success. It will take 
more long-term studies of the impact of teaching to test such theories. Moreover, the evidence that 
teachers vary in the extent to which initial value-added persists[xvii] suggests the need to assess the
impact of teachers and schools on a wide range of outcomes. 

IMPLICATIONS FOR POLICY AND PRACTICE 

Teacher value-added scores, computed with care, should be taken seriously because these scores serve 
as meaningful signals of long-term benefit to students. The caveat "computed with care" is important. 
The researchers cited here took great care to identify and control for potential sources of bias. At the 
same time, teacher value-added scores are not precise. As a result, even those who advocate using 
value-added in teacher evaluation emphasize the importance of combining value-added with data from 
other measures of classroom effectiveness. 

The research suggests another way that we can and should enrich data on effective teaching: examining 
the value that teachers add to outcomes other than standardized test scores. The evidence seems to 
suggest that teacher effectiveness contributes to long-term outcomes in ways that are imperfectly 
captured by test scores. Effective teachers likely assist their students by producing a range of skills that 
support later success. Many school districts already have data that can help them assess teacher 
contributions to achievement in later grades, course-taking, high school graduation, and even college 
attendance and completion. We can thus see potential for policymakers, practitioners, and researchers 
to collaborate in constructing a richer set of effectiveness indicators so we can better appreciate the 
impact of teaching. 